Visualization of Binary String Convergence by Sammon Mapping

Authors

  • Richard Dybowski
  • Trevor D. Collins
  • P. R. Weller
Abstract

Understanding the evolution of a complex genetic algorithm is a non-trivial problem, yet genetic-algorithm visualization is still in its infancy. This paper reviews some of the current approaches and presents a new visualization approach based on Sammon mapping. Sammon mapping is a nonlinear mapping of a set of vectors in p-dimensional space to a set in r-dimensional space, where r < p. The mapping attempts to preserve in r-space the Euclidean inter-vector distances present in p-space. We demonstrate that a Sammon mapping to 2-space of binary chromosomes present in a higher-dimensional allele space during the execution of a genetic algorithm can indicate the presence of multiple solutions. Shortfalls of this approach are discussed along with possible solutions.

1.0 Introduction

A genetic algorithm (GA) is a complex search algorithm intended to randomly sample a problem space and then, through a series of recombination and mutation operations, identify the regions of the problem space containing useful solutions. The user's understanding of a GA's evolution is generally based on monitoring the chromosomes' fitness values, which are often presented as a results-per-generation and/or a fitness-versus-time graph (examples can be found in Goldberg (1989)). This graphical representation enables the user to observe the mean fitness levels, the spread of fitness ratings, and the rate of improvement. Although such a representation does illustrate the convergence of the fitness ratings of consecutive populations, it illustrates nothing about the content of the individual chromosomes. That is to say, the user can gather no information about the number of good solution regions being considered, their value, or their similarity. The aim of this paper is to explain one approach to producing a visual presentation of a GA's chromosomes so that the information not currently available from the fitness-versus-time graph can be made more apparent.

The remainder of this paper is organised as follows. The second section describes some of the existing approaches to GA visualization; the third section explains the approach investigated by us (i.e. Sammon mapping); the fourth section provides some results; and the fifth section presents known shortfalls of this technique and details some of the continuing work being done.

2.0 Current methods of GA visualization

Software visualization is "the use of the crafts of typography, graphic design, animation and cinematography with modern human-computer interaction technology to facilitate both the human understanding and effective use of computer software" (Price et al., 1993). Although a fitness-versus-time graph is one of the most common methods for visualizing a GA, it is not the only one. Other graphical displays have been used in varying forms to illustrate individual chromosome fitness ratings within the same generation. These have been examined by Collins (1993) and summarised by Routen and Collins (1993). The use of individual population fitness graphs and histogram plots illustrating each chromosome's fitness rating was examined. Although these views of the GA's data do illustrate the fitness levels within each population, they add little information that cannot be extracted from the fitness-versus-time graph. In fact, because they are ordered by fitness, the views could disguise the discovery of multiple near-optimal solutions.

Another approach adopted by Collins (1993) was to present iconic representations of the chromosomes themselves. A simple example is to present the alleles of a chromosome as a row of grey-scaled squares. By translating each chromosome into an iconic image, each population could be displayed and common patterns (i.e. schemata) identified.
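
As a rough, text-only illustration of the iconic idea, the sketch below draws each binary chromosome as a row of filled and empty cells and then reports the positions on which the whole population agrees. The function names `render_icon` and `common_schema`, the `#`/`.` cell characters, and the example population are assumptions made for this sketch; they are not taken from Collins (1993).

```python
def render_icon(chromosome):
    """Draw one binary chromosome as a row of filled ('#') and empty ('.') cells."""
    return "".join("#" if allele == "1" else "." for allele in chromosome)


def common_schema(population):
    """Return the schema shared by every chromosome.

    Positions where all chromosomes agree keep their allele; all other
    positions become '*' (the usual "don't care" symbol).
    """
    if not population:
        raise ValueError("population must not be empty")
    if len({len(c) for c in population}) != 1:
        raise ValueError("all chromosomes must have the same length")
    return "".join(
        column[0] if len(set(column)) == 1 else "*"
        for column in zip(*population)
    )


# Hypothetical population of four 6-bit chromosomes.
population = ["110010", "110110", "100010", "110011"]

for chromosome in population:
    print(render_icon(chromosome))

print(common_schema(population))  # 1*0*1*
```

A display like this makes shared columns easy to spot by eye for a handful of short chromosomes; it does not address the cost and clutter of drawing large populations.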

Although this method presents the individual chromosomes in each population, the computational power required to generate each icon can make it inappropriate for real-time visualization of large populations of long chromosomes. Furthermore, the amount of information displayed can clutter the screen.

The Proceedings of the Fifth Annual Conference on Evolutionary Programming (EP96), MIT Press. San Diego, CA. Edited by L.J. Fogel, P.J. Angeline, and T. Baeck. Pages 377-383.

The most promising known approach to visualizing the chromosome data is the use of a dataspace metaphor in which each chromosome is represented as a point in two- or three-dimensional space mapped from its position in a higher-dimensional allele space. This creates a simple visual image of a GA's exploratory search. Nassersharif et al. (1994) proposed this method for two-dimensional problems in which the GA's chromosomes could be displayed directly in a three-dimensional scatterplot. The two problem dimensions of each chromosome were mapped onto the x and y axes, with the corresponding fitness rating being mapped onto the z axis. Although this is a very salient representation, their technique is limited to problems of only two dimensions.

Another approach to presenting a dataspace view was proposed by Routen and Collins (1993), in which each chromosome was rated according to its similarity to a base chromosome (for example, a chromosome with all alleles equal to zero). A two-dimensional dataspace plot could then be presented as a fitness-versus-similarity scatterplot. The failing of this method is the difficulty of identifying a suitable similarity rating.

3.0 Sammon mapping

Consider n vectors of length p regarded as n vectors in p-space A^p. Associated with these are n vectors in A^r, where r < p and r ∈ {2,3}.
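
For binary chromosomes, the p-space distances take a particularly simple form: every differing allele contributes (1 - 0)^2 = 1 to the squared Euclidean distance, so the squared distance equals the Hamming distance. The short sketch below checks this on one pair; the helper names `euclidean` and `hamming` and the example vectors are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


# Two hypothetical 6-bit chromosomes treated as points in 6-space.
a = [0, 0, 0, 0, 0, 0]
b = [1, 0, 1, 0, 0, 1]

print(hamming(a, b))         # 3
print(euclidean(a, b) ** 2)  # approximately 3.0
```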

Sammon's nonlinear mapping (Sammon, 1969) provides a subspace depiction of the distribution of the p-dimensional vectors which, as far as possible, preserves in r dimensions the original Euclidean distances between the p-dimensional vectors. This is done by iteratively reducing the disagreement between the p-space inter-vector distances and the corresponding r-space inter-vector distances by means of a steepest-descent optimisation procedure, as follows. Let the disagreement (error) between the p-space and r-space inter-vector distances after the m-th iteration, Error〈m〉, be defined by
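
The excerpt available here ends before the error definition is given. As a rough illustration of the procedure described above, the sketch below assumes the stress function usually attributed to Sammon (1969), E = (1/c) Σ_{i<j} (D_ij - d_ij)^2 / D_ij with c = Σ_{i<j} D_ij, where D_ij are the p-space distances and d_ij the r-space distances. It minimises E by plain steepest descent with a simple step-halving rule. The paper's own Error〈m〉, update rule, and parameters may differ; all function names, defaults, and the example chromosomes are assumptions.

```python
import math
import random


def pairwise(points):
    """Matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(a, b) for b in points] for a in points]


def stress(D, Y):
    """Assumed Sammon stress: squared mismatch weighted by 1 / p-space distance.

    Requires at least two distinct input vectors (otherwise the scale is zero).
    """
    n = len(Y)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if D[i][j] > 0]
    scale = sum(D[i][j] for i, j in pairs)
    return sum((D[i][j] - math.dist(Y[i], Y[j])) ** 2 / D[i][j]
               for i, j in pairs) / scale


def gradient(D, Y):
    """Gradient of the stress with respect to every r-space coordinate."""
    n, r = len(Y), len(Y[0])
    scale = sum(D[i][j] for i in range(n) for j in range(i + 1, n))
    grad = [[0.0] * r for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or D[i][j] == 0:
                continue  # identical vectors contribute nothing to the stress
            d = max(math.dist(Y[i], Y[j]), 1e-12)  # guard against division by zero
            coeff = -2.0 * (D[i][j] - d) / (scale * D[i][j] * d)
            for k in range(r):
                grad[i][k] += coeff * (Y[i][k] - Y[j][k])
    return grad


def sammon_map(X, r=2, iterations=300, step=1.0, seed=0):
    """Map the vectors in X to r-space; return the points and the stress history."""
    rng = random.Random(seed)
    D = pairwise(X)
    Y = [[rng.uniform(-1.0, 1.0) for _ in range(r)] for _ in X]
    history = [stress(D, Y)]
    for _ in range(iterations):
        grad = gradient(D, Y)
        trial = [[y - step * g for y, g in zip(yi, gi)]
                 for yi, gi in zip(Y, grad)]
        trial_stress = stress(D, trial)
        if trial_stress < history[-1]:
            Y = trial                 # accept the downhill step
            history.append(trial_stress)
            step *= 1.1               # cautiously lengthen the next step
        else:
            step *= 0.5               # overshoot: retry with a shorter step
    return Y, history


# Hypothetical population: two groups of similar 8-bit chromosomes.
chromosomes = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
]
Y, history = sammon_map(chromosomes)
print(f"stress: {history[0]:.4f} -> {history[-1]:.4f}")
for point in Y:
    print(f"({point[0]:+.3f}, {point[1]:+.3f})")
```

Plotting the returned 2-space points as a scatterplot gives the kind of dataspace view discussed in section 2.0. GA populations often contain duplicate chromosomes, whose p-space distance is zero; this sketch simply skips such pairs, which is one possible convention rather than the paper's stated treatment.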

Similar resources

Multidimensional Scaling for Evolutionary Algorithms - Visualization of the Path through Search Space and Solution Space Using Sammon Mapping

Multidimensional scaling as a technique for the presentation of high-dimensional data with standard visualization techniques is presented. The technique used is often known as Sammon mapping. We explain the mathematical foundations of multidimensional scaling and its robust calculation. We also demonstrate the use of this technique in the area of evolutionary algorithms. First, we present the v...

On global self-organizing maps

Self-Organizing Feature-Mapping (SOFM) algorithm is frequently used for visualization of high-dimensional (input) data in a lower-dimensional (target) space. This algorithm is based on adaptation of parameters in local neighborhoods and therefore does not lead to the best global visualization of the input space data clusters. SOFM is compared here with alternative methods of global visualizatio...

ViSOM - a novel method for multivariate data projection and structure visualization

When used for visualization of high-dimensional data, the self-organizing map (SOM) requires a coloring scheme, such as the U-matrix, to mark the distances between neurons. Even so, the structures of the data clusters may not be apparent and their shapes are often distorted. In this paper, a visualization-induced SOM (ViSOM) is proposed to overcome these shortcomings. The algorithm constrains a...

Dimension Reduction and Data Visualization Using Neural Networks

The problem of visual presentation of multidimensional data is discussed. The projection methods for dimension reduction are reviewed. The chapter also deals with the artificial neural networks that may be used for dimension reduction and data visualization. The stress is put on combining the self-organizing map (SOM) and Sammon mapping and on the neural network for Sammon’s mapping SAMANN. Large...

An Extension to the Sammon Mapping for the Robust Visualization of Speaker Dependencies

We present a novel method for the visualization of speakers which is microphone independent. To solve the problem of lacking microphone independency we present two methods to reduce the influence of the recording conditions on the visualization. The first one is a registration of maps created from identical speakers recorded under different conditions, i.e., different microphones and distances ...

Publication date: 1996